Identification Of Diverse Database Subsets Using Property-Based And Fragment-Based Molecular Descriptions

Authors

  • Mark Ashton
  • John Barnard
  • Florence Casset
  • Michael Charlton
  • Geoffrey Downs
  • Dominique Gorse
  • John Holliday
  • Roger Lahana
  • Peter Willett
Abstract

This paper compares calculated molecular properties with 2D fragment bit-strings as descriptors for selecting structurally diverse subsets from a file of 44295 compounds. MaxMin dissimilarity-based selection and k-means cluster-based selection are used to select subsets containing between 1% and 20% of the file. The numbers of bioactive molecules in the selected subsets suggest that the MaxMin subsets are noticeably superior to the k-means subsets, that the property-based descriptors are marginally superior to the fragment-based descriptors, and that both approaches are noticeably superior to random selection.
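To make the MaxMin idea concrete, here is a minimal sketch of greedy MaxMin dissimilarity-based selection over fragment bit-strings. It is not the authors' implementation. The abstract does not say which dissimilarity measure or seed compound was used, so the Tanimoto-based dissimilarity, the fixed seed, and the names `tanimoto_dissimilarity` and `maxmin_select` are all assumptions made for illustration. Fingerprints are represented as Python sets of on-bit positions.

```python
def tanimoto_dissimilarity(a, b):
    """Return 1 - Tanimoto coefficient for two sets of on-bit positions.

    Assumption: the source does not name the measure; Tanimoto is used
    here only because it is a common choice for fragment bit-strings.
    """
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def maxmin_select(fingerprints, n_select, seed_index=0):
    """Greedy MaxMin selection (illustrative sketch).

    Starting from a seed, repeatedly pick the item whose dissimilarity to
    its nearest already-selected item is largest. Returns selected indices
    in the order they were chosen.
    """
    n = len(fingerprints)
    if not 0 < n_select <= n:
        raise ValueError("n_select must be between 1 and len(fingerprints)")

    selected = [seed_index]
    # min_dist[i] holds the dissimilarity from item i to its nearest
    # selected item; -1.0 marks items that are already selected.
    seed = fingerprints[seed_index]
    min_dist = [tanimoto_dissimilarity(fp, seed) for fp in fingerprints]
    min_dist[seed_index] = -1.0

    while len(selected) < n_select:
        # Choose the candidate that is farthest from the selected set.
        next_index = max(range(n), key=min_dist.__getitem__)
        selected.append(next_index)
        min_dist[next_index] = -1.0
        chosen = fingerprints[next_index]
        # Update each remaining candidate's nearest-selected distance.
        for i, fp in enumerate(fingerprints):
            if min_dist[i] >= 0.0:
                d = tanimoto_dissimilarity(fp, chosen)
                if d < min_dist[i]:
                    min_dist[i] = d
    return selected


if __name__ == "__main__":
    # Toy fingerprints (hypothetical data, not from the paper).
    fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {7, 8}]
    print(maxmin_select(fps, 3))
```

Keeping a running nearest-selected distance for every candidate means each new pick costs one pass over the file rather than a comparison against the whole selected set. The choice of seed affects which subset is returned, so results for a different seed may differ.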

Similar resources

Molecular identification of infertile bulls by using newly developed DDX3Y based on human STS markers

To determine the role of the DDX3Y gene in spermatogenesis and infertility in bulls, blood samples were collected from five infertile bulls (azoospermic; no sperm in the semen) at the Animal Breeding Center in Karaj, Iran. The human primers recommended by EAA/EQMN were investigated using the BLASTn database for STS marker detection. Alignment of STS marker genes with the bovine genome was performed. Pr...

Molecular identification of Dicrocoelium dendriticum using 28s rDNA genomic marker and its histopathologic features in domestic animals in western Iran

Introduction: Dicrocoeliasis is a common disease of the bile ducts and gallbladder of domestic and wild ruminants. This disease is caused by different species of Dicrocoelium, including Dicrocoelium dendriticum. The aim of this study was to identify pathological damage and molecular features associated with this parasite in ruminants. Materials and Methods: In this cross-sectional study, 180 fresh...

Similarity-based data mining in files of two-dimensional chemical structures using fingerprint measures of molecular resemblance

This paper reviews the use of measures of inter-molecular similarity for processing databases of chemical structures, which play an important role in the discovery of new drugs by the pharmaceutical industry. The similarity measures considered here are based on the use of a fingerprint representation of molecular structure, where a fingerprint is a vector encoding the presence of fragment subst...

Identification of Candida species isolated from vulvovaginal candidiasis using PCR-RFLP

Vulvovaginal candidiasis (VVC) is a common disease among women worldwide; therefore, accurate and rapid diagnosis of causative agents based on molecular techniques utilizing amplification of target DNA is highly recommended for epidemiological purposes and for effective treatment. The aim of this study was to identify clinically Candida species from VVC patients by restriction fragment length po...

Similarity and Dissimilarity Methods for Processing Chemical Structure Databases

This paper reviews measures of similarity and dissimilarity between pairs of chemical molecules and the use of such measures for processing chemical databases. The applications discussed include similarity searching, database clustering and diversity analysis, focusing upon measures that are based on fragment bit-string occurrence data. The paper then discusses recent work on the calculation of...

Publication date: 2008